Abstract

We consider the cascade and triangular rate-distortion problems where side information is known to the source
encoder and to the first user but not to the second user. We characterize the rate-distortion region for these problems.
For the quadratic Gaussian case, we show that it is sufficient to consider jointly Gaussian distributions, a fact that 
leads to an explicit solution. 
I. Introduction

Yamamoto [1] considered the cascade source coding problem, where a source sends a message to User 1, and
then User 1 sends a message to User 2. In this paper, we extend Yamamoto's cascade source coding problem to
the case where side information is known to the source and to User 1, but not to User 2. The problem is depicted
in Fig. 1.
Fig. 1. A cascade rate distortion problem with three nodes (encoder, User 1, User 2), where the first two nodes have side information Y. User
1 and User 2 need to reconstruct the source X to within distortion criteria.



More recently, Vasudevan, Tian and Diggavi [2] considered the cascade source coding problem, where side
information, Y, is known to the source encoder and to User 1, additional side information Z is known to User 2,
and the Markov chain X - Z - Y holds. Vasudevan et al. [2] provided an inner and an outer bound and showed that
the bounds coincide for the Gaussian case. Cuff, Su and El Gamal [3] considered the cascade problem where the
side information is known only to the intermediate node and provided an inner and an outer bound. An additional
related problem, which was considered and solved in [4], is that of cascade source coding when side information
is known to all nodes with a limited rate. Table I summarizes the literature on cascade source coding with side
information.

TABLE I

LITERATURE OVERVIEW OF CASCADE SOURCE CODING WITH SIDE INFORMATION AS SHOWN IN FIG. 2
Fig. 2. A cascade rate distortion problem with several options of side information. Table I summarizes the literature on this problem.



Of special interest in lossy source coding is the Gaussian case with quadratic distortion, which in many source
coding problems results in an analytical solution, such as in the Wyner-Ziv problem [5] where side information is
available to the decoder, the Heegard-Berger problem [6] where side information at the decoder may be absent,
Kaspi's problem [7], [8] where side information is known to the encoder and may or may not be known to the
decoder, the multiple description problem [9], [10], the two-way source coding problem [11], the multi-terminal
problem [12], [13], the CEO problem [14]-[16], rate distortion with a helper [17], [18], and successive refinement
[19] and its extension to successive refinement for the Wyner-Ziv problem [20].

Our main result in this paper is that the achievable region for the problem depicted in Fig. 1 is given by
$\mathcal{R}(D_1, D_2)$, which is defined as the set of all rate pairs $(R_1, R_2)$ that satisfy

$$R_2 \ge I(Y, X; \hat{X}_2), \tag{1}$$

$$R_1 \ge I(X; \hat{X}_1, \hat{X}_2 \mid Y), \tag{2}$$

for some joint distribution $P(x,y)P(\hat{x}_1, \hat{x}_2 \mid x, y)$ for which

$$\mathbb{E}\, d_i(X, \hat{X}_i) \le D_i, \quad i = 1, 2. \tag{3}$$



An extension of the cascade source coding problem is the triangular setting [21], where there is an additional
direct link from the source encoder to User 2. We solve this problem where side information exists at the source
encoder and User 1, but not at User 2.

The remainder of the paper is organized as follows. In Section II we formally define the cascade problem and
present the theorem establishing the achievable region. In Section III we provide converse and achievability
proofs of the theorem, and in Section IV we explicitly compute the rate region for the Gaussian case. In Section V
we extend our result to the triangular case (cf. Fig. 5), and in Section VI we further extend the results to multiple
users and discuss the corresponding empirical coordination problem.



II. Cascade rate distortion: Problem definitions and main results 

Here we formally define the cascade rate-distortion problem where side information is known to the source
encoder and to User 1. We present a single-letter characterization of the achievable region. We use the regular
definitions of rate distortion, and we follow the notation of [22]. The source sequence $\{X_i \in \mathcal{X}, i = 1, 2, \dots\}$
and the side information sequence $\{Y_i \in \mathcal{Y}, i = 1, 2, \dots\}$ are discrete random variables drawn from finite alphabets
$\mathcal{X}$ and $\mathcal{Y}$, respectively. The random variables $(X_i, Y_i)$ are i.i.d. $\sim P(x,y)$. Let $\hat{\mathcal{X}}_1$ and $\hat{\mathcal{X}}_2$ be the reconstruction
alphabets, and let $d_i : \mathcal{X} \times \hat{\mathcal{X}}_i \to [0, \infty)$, $i = 1, 2$, be single-letter distortion measures. Distortion between sequences
is defined in the usual way, as the per-letter average

$$d_i(x^n, \hat{x}_i^n) = \frac{1}{n} \sum_{j=1}^{n} d_i(x_j, \hat{x}_{i,j}), \quad i = 1, 2. \tag{4}$$

Let $\mathcal{M}_i$ denote the set of positive integers $\{1, 2, \dots, M_i\}$ for $i = 1, 2$.

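Definition 1 (Cascade rate distortion code with side information at the first two nodes): An
$(n, M_1, M_2, D_1, D_2)$ code for source X and side information Y consists of two encoders

$$f_1 : \mathcal{X}^n \times \mathcal{Y}^n \to \mathcal{M}_1, \qquad f_2 : \mathcal{M}_1 \times \mathcal{Y}^n \to \mathcal{M}_2,$$

and two decoders

$$g_1 : \mathcal{M}_1 \times \mathcal{Y}^n \to \hat{\mathcal{X}}_1^n, \qquad g_2 : \mathcal{M}_2 \to \hat{\mathcal{X}}_2^n,$$

whose reconstructions satisfy the distortion constraints $\mathbb{E}\, d_i(X^n, \hat{X}_i^n) \le D_i$, $i = 1, 2$.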
The rate pair $(R_1, R_2)$ of the $(n, M_1, M_2, D_1, D_2)$ code is defined by

$$R_i = \frac{1}{n} \log M_i, \quad i = 1, 2. \tag{8}$$

Definition 2: Given a distortion pair $(D_1, D_2)$, a rate pair $(R_1, R_2)$ is said to be achievable if, for any $\epsilon > 0$
and sufficiently large n, there exists an $(n, 2^{nR_1}, 2^{nR_2}, D_1 + \epsilon, D_2 + \epsilon)$ code for the source X with side information
Y.

Definition 3: The (operational) achievable region $\mathcal{R}^O(D_1, D_2)$ of cascade rate distortion is the closure of the
set of all achievable rate pairs.

Theorem 1 is the main result of this work.

Theorem 1: For the cascade rate distortion problem with side information at the source and User 1, as depicted
in Fig. 1, the achievable region is given by

$$\mathcal{R}^O(D_1, D_2) = \mathcal{R}(D_1, D_2), \tag{9}$$

where the region $\mathcal{R}(D_1, D_2)$ is defined in (1)-(3).
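For a finite-alphabet test channel, the two mutual-information bounds in (1)-(2) can be evaluated numerically. The following is a minimal sketch, not part of the original paper: the function names `entropy` and `cascade_rate_bounds`, and the convention that the joint pmf is stored as a 4-dimensional array indexed by $(x, y, \hat{x}_1, \hat{x}_2)$, are assumptions made for illustration. Rates are in bits.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a pmf stored as an array of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]                      # 0 * log 0 is taken as 0
    return float(-(p * np.log2(p)).sum())

def cascade_rate_bounds(p):
    """Evaluate the right-hand sides of (2) and (1).

    p[x, y, x1, x2] is an assumed layout for the joint pmf of
    (X, Y, Xhat_1, Xhat_2). Returns (I(X; Xhat_1, Xhat_2 | Y), I(X, Y; Xhat_2)).
    """
    p = np.asarray(p, dtype=float)
    h_xy = entropy(p.sum(axis=(2, 3)))
    # I(X,Y; Xhat_2) = H(X,Y) + H(Xhat_2) - H(X,Y,Xhat_2)
    r2 = h_xy + entropy(p.sum(axis=(0, 1, 2))) - entropy(p.sum(axis=2))
    # I(X; Xhat_1,Xhat_2 | Y) = H(X,Y) + H(Y,Xhat_1,Xhat_2) - H(Y) - H(X,Y,Xhat_1,Xhat_2)
    r1 = h_xy + entropy(p.sum(axis=0)) - entropy(p.sum(axis=(0, 2, 3))) - entropy(p)
    return r1, r2
```

For instance, when X is a uniform bit, Y is an independent uniform bit, and both reconstructions equal X, both bounds evaluate to one bit; when Y = X the bound on $R_1$ drops to zero, since User 1 already knows the source.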

III. Proof of Theorem 1

Achievability: The proof follows classical arguments, and therefore the technical details will be omitted. We
describe only the coding structure and justify why the indicated region is achievable. We fix a joint distribution
$P_{X,Y,\hat{X}_1,\hat{X}_2}$ for which (3) holds, and an $\epsilon > 0$, and we show that there exists a code with rates

$$R_2 = I(Y, X; \hat{X}_2) + \epsilon, \tag{10}$$

$$R_1 = I(X; \hat{X}_1, \hat{X}_2 \mid Y) + 3\epsilon, \tag{11}$$

complying with the distortion constraints.

Generate randomly $2^{n(I(X,Y;\hat{X}_2)+\epsilon)}$ codewords $\hat{x}_2^n$ using an i.i.d. $\sim P_{\hat{X}_2}$ distribution. Then bin the codewords into
$2^{n(I(X;\hat{X}_2 \mid Y)+2\epsilon)}$ bins; in each bin there are $2^{n(I(X,Y;\hat{X}_2)-I(X;\hat{X}_2 \mid Y)-\epsilon)} = 2^{n(I(Y;\hat{X}_2)-\epsilon)}$ codewords. In addition,
for any typical sequences $(y^n, \hat{x}_2^n)$ generate $2^{n(I(X;\hat{X}_1 \mid Y,\hat{X}_2)+\epsilon)}$ codewords $\hat{x}_1^n$ using the pmf
$P(\hat{x}_1^n \mid y^n, \hat{x}_2^n) = \prod_{i=1}^{n} P_{\hat{X}_1 \mid Y, \hat{X}_2}(\hat{x}_{1,i} \mid y_i, \hat{x}_{2,i})$.

The source encoder receives the sequences $x^n, y^n$ and first looks for a codeword $\hat{x}_2^n$ that is jointly typical with
$x^n, y^n$. If there is such a codeword, the source encoder sends the index of the bin that includes this codeword to
User 1. User 1 looks for the codeword in the received bin that is jointly typical with the side information $y^n$. Since there
are fewer than $2^{nI(Y;\hat{X}_2)}$ codewords in the bin, with high probability only one codeword will be jointly typical with $y^n$, and it
will be the codeword chosen by the encoder. User 1 then forwards the codeword to User 2.

Now we can think of a new problem where the source encoder and User 1 have side information $(Y^n, \hat{X}_2^n)$, and
hence a rate $I(X; \hat{X}_1 \mid Y, \hat{X}_2) + \epsilon$ is needed to generate $\hat{X}_1^n$ that is jointly typical with $(X^n, Y^n, \hat{X}_2^n)$. Therefore, a
total rate to User 1 of $R_1 = I(X;\hat{X}_2 \mid Y) + 2\epsilon + I(X; \hat{X}_1 \mid Y, \hat{X}_2) + \epsilon = I(X; \hat{X}_1, \hat{X}_2 \mid Y) + 3\epsilon$ is needed, and a
rate $R_2 = I(Y, X; \hat{X}_2) + \epsilon$ is needed from User 1 to User 2.
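The rate accounting in the achievability argument rests on two applications of the chain rule for mutual information. As a worked check, using only quantities already defined in the text, the bin size and the total rate to User 1 can be verified as follows:

```latex
\begin{align*}
I(X,Y;\hat{X}_2) - I(X;\hat{X}_2 \mid Y)
  &= \bigl[ I(Y;\hat{X}_2) + I(X;\hat{X}_2 \mid Y) \bigr] - I(X;\hat{X}_2 \mid Y)
   = I(Y;\hat{X}_2), \\
I(X;\hat{X}_2 \mid Y) + I(X;\hat{X}_1 \mid Y,\hat{X}_2)
  &= I(X;\hat{X}_1,\hat{X}_2 \mid Y).
\end{align*}
```

The first identity gives the exponent of the number of codewords per bin (up to the $\epsilon$ terms), and the second gives the total rate sent to User 1.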
Converse: Assume that we have an $(n, M_1 = 2^{nR_1}, M_2 = 2^{nR_2}, D_1, D_2)$ code as in Definition 1. We will show
the existence of a joint distribution $P_{X,Y,\hat{X}_1,\hat{X}_2}$ that satisfies (1)-(3). Denote $T_1 = f_1(X^n, Y^n) \in \{1, \dots, 2^{nR_1}\}$ and
$T_2 = f_2(T_1, Y^n) \in \{1, \dots, 2^{nR_2}\}$. Then,

$$\begin{aligned}
nR_2 &\ge H(T_2) \\
&\ge I(X^n, Y^n; T_2) \\
&= \sum_{i=1}^{n} \left[ H(X_i, Y_i) - H(X_i, Y_i \mid T_2, X^{i-1}, Y^{i-1}) \right] \\
&\stackrel{(a)}{=} \sum_{i=1}^{n} \left[ H(X_i, Y_i) - H(X_i, Y_i \mid \hat{X}_{2,i}, T_2, X^{i-1}, Y^{i-1}) \right] \\
&\ge \sum_{i=1}^{n} I(X_i, Y_i; \hat{X}_{2,i}),
\end{aligned} \tag{12}$$

where equality (a) follows from the fact that the reconstruction at time i, $\hat{X}_{2,i}$, is a deterministic function of $T_2$.
Now consider 
where equality (a) follows from the fact that $T_2$ is a deterministic function of $T_1$ and $Y^n$, and, similarly, equality
(b) follows from the fact that $\hat{X}_{1,i}$ and $\hat{X}_{2,i}$ are deterministic functions of $(T_1, Y^n)$ and $T_2$, respectively.

The proof is concluded in the standard way by letting Q be a random variable independent of $X^n, Y^n$, uniformly
distributed over the set $\{1, 2, \dots, n\}$, and considering the joint distribution of $X_Q, Y_Q, \hat{X}_{1,Q}, \hat{X}_{2,Q}$. For this joint
distribution, inequalities (12) and (13) imply that (1) and (2) hold, respectively, and the distortion constraints satisfied by the code imply that (3) holds. ■

IV. Cascade rate distortion: the Gaussian case 

In this section we explicitly calculate the rate region $\mathcal{R}(D_1, D_2)$ for the case where X and Y are jointly Gaussian
and the distortion is the squared-error distortion. The converse and the achievability in the previous sections are proved
for the finite alphabet case, but they can be extended to the Gaussian case [5].
Our first step in finding the achievable region for the quadratic Gaussian case is to show that it suffices to consider
only jointly Gaussian distributions $P_{X,Y,\hat{X}_1,\hat{X}_2}$ in order to exhaust the rate region. Then we solve an optimization
problem to find the achievable rate region explicitly.

Lemma 2 (Optimality of jointly Gaussian distributions): For the quadratic Gaussian cascade rate-distortion problem
with side information known to the source encoder and to User 1, i.e., X, Y are jointly Gaussian and
$d_1(x, \hat{x}_1) = (x - \hat{x}_1)^2$, $d_2(x, \hat{x}_2) = (x - \hat{x}_2)^2$, it suffices to consider only jointly Gaussian distributions $P_{X,Y,\hat{X}_1,\hat{X}_2}$
in order to exhaust the rate region $\mathcal{R}(D_1, D_2)$ given in (1)-(3).

Proof: Let us fix a point $(R_1, R_2, D_1, D_2)$ in the rate region and let $P_{X,Y,\hat{X}_1,\hat{X}_2}$ be a joint distribution that
satisfies (1)-(3). Such a distribution must exist since inequalities (1)-(3) define the rate region (Theorem 1). Let
K denote the covariance matrix induced by $P_{X,Y,\hat{X}_1,\hat{X}_2}$ and let $P^G_{X,Y,\hat{X}_1,\hat{X}_2}$ denote a normal joint distribution
with mean zero and covariance matrix K. Now let us show that (1)-(3) also hold when the joint distribution is
$P^G_{X,Y,\hat{X}_1,\hat{X}_2}$. Inequality (3) is automatically satisfied, since it depends on the distribution of $(X, Y, \hat{X}_1, \hat{X}_2)$ only
through the covariance matrix K. Consider,

$$\begin{aligned}
R_1 &\ge I(X; \hat{X}_1, \hat{X}_2 \mid Y) \\
&= h(X \mid Y) - h(X \mid \hat{X}_1, \hat{X}_2, Y) \\
&\stackrel{(a)}{=} h(X \mid Y) - h(X - (\alpha_1 \hat{X}_1 + \alpha_2 \hat{X}_2 + \alpha_3 Y) \mid \hat{X}_1, \hat{X}_2, Y) \\
&\stackrel{(b)}{\ge} h(X \mid Y) - h(X - (\alpha_1 \hat{X}_1 + \alpha_2 \hat{X}_2 + \alpha_3 Y)) \\
&\stackrel{(c)}{\ge} h(X \mid Y) - h_{P^G}(X - (\alpha_1 \hat{X}_1 + \alpha_2 \hat{X}_2 + \alpha_3 Y)) \\
&\stackrel{(d)}{=} I_{P^G}(X; \hat{X}_1, \hat{X}_2 \mid Y),
\end{aligned} \tag{14}$$

where equality (a) is true for any set of scalars $(\alpha_1, \alpha_2, \alpha_3)$, and in particular if we choose those of the optimal linear estimator
of X given $\hat{X}_1, \hat{X}_2, Y$. Note that the coefficients $(\alpha_1, \alpha_2, \alpha_3)$ and the variance $\mathbb{E}(X - (\alpha_1 \hat{X}_1 + \alpha_2 \hat{X}_2 + \alpha_3 Y))^2$ are
a function only of the covariance matrix K. Inequality (b) follows from the fact that conditioning reduces entropy,
and (c) follows from the fact that, given a variance, the Gaussian distribution maximizes the differential entropy.
The term $I_{P^G}(X; \hat{X}_1, \hat{X}_2 \mid Y)$ denotes the mutual information induced by the Gaussian distribution $P^G_{X,Y,\hat{X}_1,\hat{X}_2}$, and
equality (d) follows from the fact that for the Gaussian distribution the error, i.e., $X - (\alpha_1 \hat{X}_1 + \alpha_2 \hat{X}_2 + \alpha_3 Y)$, is
independent of the observations $\hat{X}_1, \hat{X}_2, Y$.
Similarly, we have

$$\begin{aligned}
R_2 &\ge I(Y, X; \hat{X}_2) \\
&= I(Y; \hat{X}_2) + I(X; \hat{X}_2 \mid Y) \\
&\ge I_{P^G}(Y; \hat{X}_2) + I_{P^G}(X; \hat{X}_2 \mid Y),
\end{aligned} \tag{15}$$

where the last inequality follows from the same steps as (14). ■
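Because Lemma 2 reduces the problem to jointly Gaussian distributions, every mutual information in the region becomes a ratio of conditional variances that depends only on the covariance matrix K. The sketch below illustrates this; it is not taken from the paper, the helper names `cond_var` and `gaussian_cond_mi` are hypothetical, and all variables are assumed to be zero-mean scalars with a nonsingular covariance on the conditioning set. Rates are in bits.

```python
import numpy as np

def cond_var(K, target, given):
    """Variance of component `target` given the components in `given`
    for a zero-mean Gaussian vector with covariance K (Schur complement)."""
    K = np.asarray(K, dtype=float)
    g = list(given)
    if len(g) == 0:
        return float(K[target, target])
    k_tg = K[target, g]
    return float(K[target, target] - k_tg @ np.linalg.solve(K[np.ix_(g, g)], k_tg))

def gaussian_cond_mi(K, target, obs, cond):
    """I(target; obs | cond) in bits for jointly Gaussian scalars,
    computed as half the log-ratio of two conditional variances."""
    num = cond_var(K, target, list(cond))
    den = cond_var(K, target, list(cond) + list(obs))
    return 0.5 * float(np.log2(num / den))
```

As a usage example, take unit-variance X, side information Y = X + Z and a reconstruction proxy W = X + N with independent unit-variance Z and N, ordered as (X, Y, W). Then `gaussian_cond_mi(K, 0, [2], [1])` evaluates I(X; W | Y) as half the log of (1/2)/(1/3).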
The next theorem provides an explicit expression for the Gaussian case. The proof is provided in Appendix A
and is based on Lemma 2 and on solving an optimization problem with quadratic constraints and a linear objective.
Theorem 3 (Cascade Gaussian case): The rate region of the cascade source coding with side information at the
first two nodes, where the source X and the side information Y = X + Z are jointly Gaussian distributed, where
X and Z are mutually independent, and the distortion is quadratic, is given by

Fig. 3 depicts the regions for two specific values of $D_1$ and $D_2$, chosen such that all four cases of Eq. (17) are captured.
Fig. 3. The Gaussian quadratic rate region. The graph on the left-hand side shows the rate region for the case where $\sigma_X^2 = \sigma_Z^2 = 1$, $D_2 = 0.35$
and $D_1 = 0.4$. Since $D_2 < \sigma^2_{X|Y}$, the rate region is given by Cases (a) and (b) in Eq. (17). The right-hand side graph shows the rate region
for the case where $\sigma_X^2 = \sigma_Z^2 = 1$, $D_2 = 0.65$ and $D_1 = 0.5$. Since $D_2 > \sigma^2_{X|Y}$, the rate region is given by Cases (c) and (d) in Eq. (17).



Now, let us consider several extreme cases that can be easily solved using Theorem 3.

1) Side information is independent of the source, $X \perp Y$: This means that $\sigma^2_{X|Y} = \sigma_X^2$ and $\sigma_Z^2 = \infty$. For such
a case (17) becomes

if D 2 < a\ and & < 2 2R2 < ^ 

if D 2 < a\ and 2 2R2 > ^ (18) 
if D 2 > a x , and 2 2R2 > 



°~x\w,y{D\,D 2 , R 2 ) 



°x, 
D 2 , 
oc . 



8 



and this implies that 



D 2 ,R 2 ) = i max I log log J , 



(19) 



recovering a result that appears in the successive refinement source coding paper [19].

2) Side information equals the source, i.e., $X = Y$: For this case, $\sigma^2_{X|Y} = 0$; hence $R_1 = 0$ and $2^{2R_2} \ge \frac{\sigma_X^2}{D_2}$,
consistent with the well-known rate distortion function of the Gaussian source.

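3) $R_2 \to \infty$: If $D_2 < \sigma^2_{X|Y}$, then

$$R_1(D_1, D_2, R_2) = \frac{1}{2} \max\left( \log \frac{\sigma^2_{X|Y}}{D_1}, \log \frac{\sigma^2_{X|Y}}{D_2} \right), \tag{20}$$

and if $D_2 \ge \sigma^2_{X|Y}$, then

$$R_1(D_1, D_2, R_2) = \frac{1}{2} \max\left( \log \frac{\sigma^2_{X|Y}}{D_1}, 0 \right). \tag{21}$$

Note that for this case we can assume that the side information Y is known to all three nodes; hence only $\sigma^2_{X|Y}$
is manifested in the expression.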

4) The message that User 2 receives depends only on the side information: In this extreme case, the rate $R_2$ and
the distortion $D_2$ are large enough so that the message that User 2 receives depends only on the side information.
This case is depicted in Fig. 4.
Fig. 4. An extreme case where the rate $R_2$ and the distortion $D_2$ are large enough so that the message that User 2 receives depends only on
the side information.



For this extreme, the rate region is simply

$$R_1 \ge I(X; \hat{X}_1 \mid Y), \qquad R_2 \ge I(Y; \hat{X}_2), \tag{22}$$

for all joint Gaussian distributions that satisfy $\sigma^2_{X|Y,\hat{X}_1} \le D_1$ and $\sigma^2_{X|\hat{X}_2} \le D_2$.

More explicitly, this region is given by

Indeed, if (23) holds, then according to Theorem 3, $R_1(D_1, D_2, R_2) = \frac{1}{2} \max\left( \log \frac{\sigma^2_{X|Y}}{D_1}, 0 \right)$.

V. Triangular source coding with side information 

In this section, we extend the cascade source coding discussed in previous sections by adding a direct link from
the encoder to the second user, as depicted in Fig. 5. The definition of the code $(n, M_1, M_2, M_3, D_1, D_2)$ is similar
to the one given in Definition 1 for the cascade case, with an additional message from a set of size $M_3$, at rate $R_3$, sent from the source to
User 2.
Fig. 5. A triangular rate distortion problem with three nodes (encoder, User 1, User 2), where side information Y is known to the encoder
and User 1, but not to User 2. User 1 and User 2 need to reconstruct the source X to within distortion criteria.



A. Main theorem and its proof 

Theorem 4 (The achievable rate region for the triangular case): The achievable region for the problem depicted
in Fig. 5 is given by $\mathcal{R}_\Delta(D_1, D_2)$, which is defined as the set of all rate triples $(R_1, R_2, R_3)$ that satisfy

$$R_1 \ge I(X; \hat{X}_1, U \mid Y), \tag{25}$$

$$R_2 \ge I(Y, X; U), \tag{26}$$

$$R_3 \ge I(X; \hat{X}_2 \mid U), \tag{27}$$

for some joint distribution $P(x,y)P(\hat{x}_1, \hat{x}_2, u \mid x, y)$ satisfying

$$\mathbb{E}\, d_i(X, \hat{X}_i) \le D_i, \quad i = 1, 2, \tag{28}$$

where the cardinality of the auxiliary variable U may be bounded by $|\mathcal{U}| \le |\mathcal{X}||\mathcal{Y}||\hat{\mathcal{X}}_1||\hat{\mathcal{X}}_2| + 2$.

Lemma 5 below shows that one can restrict the joint distribution $P(x, y)P(\hat{x}_1, \hat{x}_2, u \mid x, y)$ to
$P(x,y)P(\hat{x}_1, u \mid x, y)P(\hat{x}_2 \mid x, u)$ without affecting the region.
Proof of Converse Part of Theorem 4: Assume that we have an $(n, 2^{nR_1}, 2^{nR_2}, 2^{nR_3}, D_1, D_2)$ code. We will show
the existence of a joint distribution $P_{X,Y,U,\hat{X}_1,\hat{X}_2}$ that satisfies (25)-(28). Denote $T_1 = f_1(X^n, Y^n) \in \{1, \dots, 2^{nR_1}\}$,
$T_2 = f_2(T_1, Y^n) \in \{1, \dots, 2^{nR_2}\}$, and $T_3 = f_3(X^n, Y^n) \in \{1, \dots, 2^{nR_3}\}$. Then,

$$\begin{aligned}
nR_1 &\ge H(T_1) \\
&\ge H(T_1 \mid Y^n) \\
&\stackrel{(a)}{=} H(T_1, T_2 \mid Y^n) \\
&\ge I(X^n; T_1, T_2 \mid Y^n) \\
&\stackrel{(b)}{=} \sum_{i=1}^{n} \left[ H(X_i \mid Y_i) - H(X_i \mid Y^n, T_1, T_2, X^{i-1}, \hat{X}_{1,i}) \right] \\
&\ge \sum_{i=1}^{n} \left[ H(X_i \mid Y_i) - H(X_i \mid Y_i, \hat{X}_{1,i}, U_i) \right] \\
&= \sum_{i=1}^{n} I(X_i; \hat{X}_{1,i}, U_i \mid Y_i),
\end{aligned} \tag{29}$$

where equality (a) follows from the fact that $T_2$ is a deterministic function of $T_1$ and $Y^n$, and, similarly, equality (b)
follows from the chain rule and the fact that $\hat{X}_{1,i}$ is a deterministic function of $(T_1, Y^n)$. The inequality after (b) follows from defining $U_i = (T_2, X^{i-1}, Y^{i-1})$ and from the fact that conditioning reduces entropy.
Now, consider 

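$$\begin{aligned}
nR_2 &\ge H(T_2) \\
&\ge I(X^n, Y^n; T_2) \\
&= \sum_{i=1}^{n} \left[ H(X_i, Y_i) - H(X_i, Y_i \mid T_2, X^{i-1}, Y^{i-1}) \right] \\
&\stackrel{(a)}{=} \sum_{i=1}^{n} \left[ H(X_i, Y_i) - H(X_i, Y_i \mid U_i) \right] \\
&= \sum_{i=1}^{n} I(X_i, Y_i; U_i),
\end{aligned} \tag{30}$$

where equality (a) follows from the definition of $U_i = (T_2, X^{i-1}, Y^{i-1})$. In addition, consider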

$$\begin{aligned}
nR_3 &\ge H(T_3) \\
&\ge H(T_3 \mid T_2) \\
&\ge I(X^n, Y^n; T_3 \mid T_2) \\
&= \sum_{i=1}^{n} \left[ H(X_i, Y_i \mid T_2, X^{i-1}, Y^{i-1}) - H(X_i, Y_i \mid T_2, T_3, X^{i-1}, Y^{i-1}) \right] \\
&\stackrel{(a)}{=} \sum_{i=1}^{n} \left[ H(X_i, Y_i \mid U_i) - H(X_i, Y_i \mid \hat{X}_{2,i}, U_i, T_3) \right] \\
&\ge \sum_{i=1}^{n} I(X_i, Y_i; \hat{X}_{2,i} \mid U_i) \\
&\ge \sum_{i=1}^{n} I(X_i; \hat{X}_{2,i} \mid U_i),
\end{aligned} \tag{31}$$

where equality (a) follows from the definition of $U_i = (T_2, X^{i-1}, Y^{i-1})$ and the fact that $\hat{X}_{2,i}$ is a deterministic
function of $(T_2, T_3)$.

The proof is concluded in the standard way by letting Q be a random variable independent of $X^n, Y^n$, uniformly
distributed over the set $\{1, 2, \dots, n\}$, and considering the joint distribution of $X_Q, Y_Q, U_Q, \hat{X}_{1,Q}, \hat{X}_{2,Q}$. For this
joint distribution, inequalities (29), (30), (31) imply that (25), (26) and (27) hold, respectively, and the fact that the
code we have fixed satisfies the distortion constraints implies that (28) holds.

To prove the cardinality bound on U, we invoke the support lemma [23, pp. 310]. The external random variable
U must have $|\mathcal{X}||\mathcal{Y}||\hat{\mathcal{X}}_1||\hat{\mathcal{X}}_2| - 1$ letters to preserve $P(x, y, \hat{x}_1, \hat{x}_2)$, plus three more to preserve the expressions
$I(X; \hat{X}_1, U \mid Y)$, $I(Y, X; U)$, $I(X; \hat{X}_2 \mid U)$. Note that preserving $P(x, y, \hat{x}_1, \hat{x}_2)$ implies that $\mathbb{E}\, d_i(X, \hat{X}_i) \le D_i$ for
$i = 1, 2$ is also preserved. ■

For the achievability part, we first establish the following: 

Lemma 5 (Optimality of $\hat{X}_2 - (X, U) - (\hat{X}_1, Y)$): The rate region $\mathcal{R}_\Delta(D_1, D_2)$, which is defined by (25)-(28),
does not decrease by restricting the joint distribution to the form $P(x, y)P(\hat{x}_1, u \mid x, y)P(\hat{x}_2 \mid x, u)$.

Proof: For a fixed $(D_1, D_2)$, let the rate triple $(R_1, R_2, R_3) \in \mathcal{R}_\Delta(D_1, D_2)$. Then there exists a joint
distribution

$$P(x, y, u, \hat{x}_1, \hat{x}_2) = P(x,y)P(\hat{x}_1, \hat{x}_2, u \mid x, y), \tag{32}$$

for which (25)-(28) hold. Let $P(\hat{x}_1, u \mid x, y)$ and $P(\hat{x}_2 \mid x, u)$ be the conditional distributions induced by
$P(x, y, u, \hat{x}_1, \hat{x}_2)$. We now claim that (25)-(28) are satisfied under the joint distribution

$$\tilde{P}(x, y, u, \hat{x}_1, \hat{x}_2) = P(x,y)P(\hat{x}_1, u \mid x, y)P(\hat{x}_2 \mid x, u). \tag{33}$$

This is true, since the expressions (25)-(28) depend on the joint distribution only through the marginals on $(x, y, u, \hat{x}_1)$
and on $(x, u, \hat{x}_2)$. Now notice that those marginals are the same whether the joint distribution is $P(x, y, u, \hat{x}_1, \hat{x}_2)$
or $\tilde{P}(x, y, u, \hat{x}_1, \hat{x}_2)$. ■
Sketch of proof of Achievability part of Theorem 4: The achievability proof follows directly from the
achievability of cascade source coding as given in Theorem 1. First, we fix a joint distribution of the form
$P(x,y)P(\hat{x}_1, u \mid x, y)P(\hat{x}_2 \mid x, u)$ such that (25)-(28) hold. Since $R_1 \ge I(X; \hat{X}_1, U \mid Y)$ and $R_2 \ge I(Y, X; U)$,
then according to Theorem 1 we can generate $(\hat{X}_1^n, U^n)$ that with high probability are jointly typical with
$(X^n, Y^n)$ according to the distribution $P(x, y)P(\hat{x}_1, u \mid x, y)$. Now, since $U^n$ is known both to the encoder and
to User 2, we need a rate $R_3 \ge I(X; \hat{X}_2 \mid U)$ to generate $\hat{X}_2^n$ such that with high probability it is jointly typical
with $(X^n, U^n)$. Finally, because of the Markov relation $\hat{X}_2 - (X, U) - (\hat{X}_1, Y)$, we can invoke the Markov lemma
and conclude that the sequences $X^n, Y^n, \hat{X}_1^n, \hat{X}_2^n, U^n$ are jointly typical, and therefore the distortion criteria are
satisfied. ■
B. The Gaussian triangular case

We now evaluate the rate region of the triangular network depicted in Fig. 5 for the quadratic Gaussian case,
i.e., X, Y are jointly Gaussian and $d_1(x, \hat{x}_1) = (x - \hat{x}_1)^2$, $d_2(x, \hat{x}_2) = (x - \hat{x}_2)^2$. We first show that it suffices to
consider only Gaussian joint distributions for exhausting the region, and then we show that by a small change in
the Gaussian cascade region we obtain the Gaussian triangular region.

Theorem 6 (Optimality of jointly Gaussian distributions): For the quadratic Gaussian triangular rate-distortion
problem with side information known to the source encoder and to User 1, it suffices to consider only jointly
Gaussian distributions $P_{X,Y,U,\hat{X}_1,\hat{X}_2}$ in order to exhaust the rate region $\mathcal{R}_\Delta(D_1, D_2)$ given in (25)-(28).

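Before proving the theorem, let us introduce the Pareto frontier [24] of a region and show that if two rate regions
have the same Pareto frontier then they are identical. The Pareto frontier of a region $\mathcal{R}$, which we denote by
$\mathrm{Par}(\mathcal{R})$, is the set of all points for which there is no strictly better point in the region. Formally,

$$\mathrm{Par}(\mathcal{R}) = \{ R^n \in \mathcal{R} : \nexists\, \tilde{R}^n \in \mathcal{R} \text{ s.t. } \tilde{R}^n \prec R^n \}, \tag{34}$$

where $\tilde{R}^n \prec R^n$ denotes that $\tilde{R}_i \le R_i$ for all $1 \le i \le n$ and, for some $1 \le i \le n$, $\tilde{R}_i < R_i$.

Lemma 7: If two rate regions, $\mathcal{R}_1$ and $\mathcal{R}_2$, have the same Pareto frontier, then they are identical.

Proof: Let us show that the assumptions $R \in \mathcal{R}_1$ and $R \notin \mathcal{R}_2$ lead to a contradiction. If $R \in \mathcal{R}_1$, then there
exists a point $R^p \in \mathrm{Par}(\mathcal{R}_1)$ that satisfies $R^p \prec R$ or $R^p = R$. Since $R^p \in \mathrm{Par}(\mathcal{R}_1)$, it follows that $R^p \in \mathrm{Par}(\mathcal{R}_2)$. Finally,
since $R^p \in \mathcal{R}_2$, no component of $R^p$ exceeds the corresponding component of $R$, and a rate region remains achievable when any rate is increased, it follows that $R \in \mathcal{R}_2$, which contradicts the assumption. ■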
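The definition in (34) can be illustrated on a finite set of rate vectors, where smaller rates are better. This is a small sketch for intuition only; the function names `dominates` and `pareto_frontier` are hypothetical, and a finite point set is an assumption (the regions in the paper are continuous sets).

```python
def dominates(a, b):
    """True if rate vector a is strictly better than b in the sense of (34):
    a_i <= b_i for every i and a_i < b_i for at least one i."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_frontier(points):
    """Return the points that are not dominated by any other point in the set."""
    pts = [tuple(p) for p in points]
    return [p for p in pts if not any(dominates(q, p) for q in pts)]
```

For example, among the rate pairs (1, 3), (2, 2), (3, 1), (2, 3) and (3, 3), only the first three lie on the frontier; the last two can each be improved in one coordinate without worsening the other.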

Proof of Theorem 6: As a result of Lemma 7 we conclude that it suffices to prove Theorem 6 only for the points
in the Pareto frontier. In addition, we notice that points that are Pareto optimal satisfy (25)-(27) with equality, which
may also be written as

$$R_1 = I(X; \hat{X}_1, U \mid Y), \tag{35}$$

$$R_2 = I(Y, X; U), \tag{36}$$

$$R_3 + R_2 = I(Y, X; \hat{X}_2, U). \tag{37}$$

Finally, assuming without loss of generality that U is real-valued and using similar arguments as in Lemma 2, we
conclude that for any joint distribution $P_{X,Y,\hat{X}_1,\hat{X}_2,U}$ there exists a Gaussian joint distribution, $P^G_{X,Y,\hat{X}_1,\hat{X}_2,U}$, with
the same covariance matrix as $P_{X,Y,\hat{X}_1,\hat{X}_2,U}$, for which the induced right-hand sides of (35)-(37) do not increase. ■

Now, with a small change in the solution to the Gaussian cascade, we obtain the triangular Gaussian region. The
proof is deferred to Appendix B.

Theorem 8 (Triangle Gaussian case): The rate region of the triangular source coding with side information at
the first two nodes, where the source X and the side information Y = X + Z are jointly Gaussian distributed,
where X and Z are mutually independent, and the distortion is quadratic, is given by Eq. (16)-(17), where $D_2$ is
replaced by $D_2 2^{2R_3}$, i.e., $R_1^{\text{triangle}}(D_1, D_2, R_2, R_3) = R_1^{\text{cascade}}(D_1, D_2 2^{2R_3}, R_2)$.






VI. Extensions 

Here we present two further extensions. The first is obtained by generalizing the triangular network results to 
more users. The second is obtained by considering a more general problem of empirical coordination rather than 
distortion criteria. 

A. Multiple Users 




Fig. 6. A triangular rate distortion problem with k + l users, where the side information Y is known to the encoder and to Users 1, 2, ..., k,
but not to Users k + 1, k + 2, ..., k + l.

The triangular problem depicted in Fig. 5 can be extended to k + l users, where the side information is known
to the source encoder and to Users 1, 2, ..., k, but is not known to Users k + 1, k + 2, ..., k + l. This problem is
depicted in Fig. 6 and its region is given by the next theorem.

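Theorem 9: The achievable region for the problem depicted in Fig. 6 is given by the rate vectors
$(R_1, R_2, \dots, R_{k+l+1})$ that satisfy

$$\begin{aligned}
R_i &\ge I(X; \hat{X}_i, \hat{X}_{i+1}, \dots, \hat{X}_{k+l-1}, U \mid Y), & 1 \le i \le k, \\
R_j &\ge I(X; \hat{X}_j, \dots, \hat{X}_{k+l-1}, U), & k+1 \le j \le k+l, \\
R_{k+l+1} &\ge I(X; \hat{X}_{k+l} \mid U),
\end{aligned} \tag{38}$$

for some distribution $P(x, y)P(\hat{x}_1, \hat{x}_2, \dots, \hat{x}_{k+l}, u \mid x, y)$ for which

$$\mathbb{E}\, d_i(X, \hat{X}_i) \le D_i, \quad 1 \le i \le k + l, \tag{39}$$

where the cardinality of the auxiliary variable U may be bounded by $|\mathcal{U}| \le |\mathcal{X}||\mathcal{Y}||\hat{\mathcal{X}}_1||\hat{\mathcal{X}}_2| \cdots |\hat{\mathcal{X}}_{k+l}| + k + l$.
The proof of Theorem 9 follows similar steps as the proof of Theorem 4 and is therefore omitted.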
B. Empirical coordination

In [25], two coordination problems were introduced: empirical coordination, where the goal is to generate
sequences with a specific empirical distribution, and strong coordination, where the goal is to generate sequences
with a distribution that is close (in total variation) to a specific i.i.d. distribution. The empirical coordination problem
is a generalization of the rate distortion problem, since a distortion constraint defines a half-plane in the empirical
distribution space. Hence, if we find the optimal rate needed to generate a specific empirical distribution, we also
find the optimal rate needed to satisfy a specific distortion constraint.
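The statement that a distortion constraint is a half-plane in the space of empirical distributions can be made concrete: the average distortion between two sequences is a linear functional of their joint empirical distribution (joint type). The sketch below illustrates this for finite sequences; the function names `empirical_distribution` and `average_distortion` are hypothetical and not taken from [25].

```python
from collections import Counter

def empirical_distribution(*seqs):
    """Joint type of equal-length sequences: maps each tuple of symbols
    to its relative frequency of occurrence."""
    n = len(seqs[0])
    assert all(len(s) == n for s in seqs), "sequences must have equal length"
    counts = Counter(zip(*seqs))
    return {symbols: c / n for symbols, c in counts.items()}

def average_distortion(x, xhat, d):
    """Per-letter average distortion, written as a linear functional
    sum_{a,b} P(a,b) d(a,b) of the joint type P of (x, xhat)."""
    pmf = empirical_distribution(x, xhat)
    return sum(p * d(a, b) for (a, b), p in pmf.items())
```

With Hamming distortion, `average_distortion([0, 1, 1, 0], [0, 1, 0, 0], lambda a, b: int(a != b))` depends on the sequences only through the joint type, which places mass 1/4 on the single mismatching pair.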

For the cascade rate distortion problem with side information at the first two nodes, the extension to the empirical 
coordination problem is straightforward. 

Theorem 10 (Rate coordination in the cascade problem): The rate coordination region $\mathcal{R}_{P_0}(P(\hat{x}_1, \hat{x}_2 \mid x, y))$ of
the cascade problem where side information is known to the first two nodes, where $(X, Y) \sim P_0(x,y)$, and an
empirical distribution $P_0(x,y)P(\hat{x}_1, \hat{x}_2 \mid x, y)$ is desired, is given by

$$R_2 \ge I(Y, X; \hat{X}_2), \qquad R_1 \ge I(X; \hat{X}_1, \hat{X}_2 \mid Y), \tag{40}$$

where the joint distribution evaluating the mutual information expressions is $P_0(x,y)P(\hat{x}_1, \hat{x}_2 \mid x, y)$.

Proof: The achievability proof follows immediately from the achievability proof of Theorem 1, where we fixed
an empirical distribution and showed that it can be achieved using the above rates. The converse also follows from
the converse of Theorem 1, but in the last step we need to invoke [25, Proposition 2], which states that the expected
empirical distribution equals the distribution of the random variables chosen uniformly over the time sequence
$1, 2, \dots, n$, i.e., the expected empirical distribution of $(X^n, Y^n, \hat{X}_1^n, \hat{X}_2^n)$ equals $P_{X_Q, Y_Q, \hat{X}_{1,Q}, \hat{X}_{2,Q}}$.



However, the triangular coordination problem is an open problem, even without side information. The solution 
here is heavily based on the fact that in the achievability proof it suffices to consider only a specific empirical 
distribution (with a Markov structure), but for an arbitrary distribution the coordination problem remains open. 

Appendix A
Proof of Theorem 3
Following Lemma 2 we can rewrite the rate region for the Gaussian case as:

$$R_2 \ge I(Y, X; W), \tag{41}$$

$$R_1 \ge I(X; V, W \mid Y), \tag{42}$$

where the vector $(X, Y, V, W)$ is jointly Gaussian distributed and satisfies

$$\sigma^2_{X|W} \le D_2, \tag{43}$$

$$\sigma^2_{X|W,V,Y} \le D_1, \tag{44}$$

where $\sigma^2_{A|B} \triangleq \mathbb{E}[(A - \mathbb{E}[A \mid B])^2]$.
Without loss of generality let us choose the following structure

$$Y = X + Z,$$

$$W = X + \alpha Y + Z_2 = (1 + \alpha)X + \alpha Z + Z_2,$$

$$V = X + \beta Y + \gamma Z_2 + Z_1, \tag{45}$$

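where the random variables $X, Z, Z_1, Z_2$ are jointly Gaussian and mutually independent, with variances
$\sigma_X^2, \sigma_Z^2, \sigma_{Z_1}^2, \sigma_{Z_2}^2$, respectively, and the coefficients $(\alpha, \beta, \gamma)$ are real scalars.
Equations (41)-(44) become

$$R_2 \ge I(X, Y; W) = h(W) - h(W \mid X, Y) = \frac{1}{2} \log \frac{(1 + \alpha)^2 \sigma_X^2 + \alpha^2 \sigma_Z^2 + \sigma_{Z_2}^2}{\sigma_{Z_2}^2}, \tag{46}$$

$$D_2 \ge \sigma^2_{X|W} = \frac{\sigma_X^2 (\alpha^2 \sigma_Z^2 + \sigma_{Z_2}^2)}{(1 + \alpha)^2 \sigma_X^2 + \alpha^2 \sigma_Z^2 + \sigma_{Z_2}^2}, \tag{47}$$

$$R_1 \ge \frac{1}{2} \max\left( \log \frac{\sigma^2_{X|Y}}{\sigma^2_{X|W,Y}}, \log \frac{\sigma^2_{X|Y}}{D_1} \right), \tag{48}$$

where $\sigma^2_{X|Y} = \frac{\sigma_X^2 \sigma_Z^2}{\sigma_X^2 + \sigma_Z^2}$ and $\sigma^{-2}_{X|W,Y} = \sigma^{-2}_{Z_2} + \sigma^{-2}_{X} + \sigma^{-2}_{Z}$.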

Inequalities (46) and (47) follow directly from (41) and (43), respectively. Eq. (48) follows from combining the
following two equations, (49)-(50). If $D_1 \ge \sigma^2_{X|W,Y}$, then (44) is automatically satisfied, and then V is not needed
(it may be independent of everything else), and therefore

$$R_1 \ge I(X; W \mid Y) = h(X \mid Y) - h(X \mid Y, W) = \frac{1}{2} \log \frac{\sigma^2_{X|Y}}{\sigma^2_{X|W,Y}}. \tag{49}$$

If $D_1 < \sigma^2_{X|W,Y}$, then

$$R_1 \ge I(X; V, W \mid Y) = h(X \mid Y) - h(X \mid Y, V, W) = \frac{1}{2} \log \frac{\sigma^2_{X|Y}}{D_1}. \tag{50}$$

The last equality is due to the fact that we can choose $(\beta, \gamma, Z_1)$ such that $\sigma^2_{X|W,V,Y} = D_1$.

Now let us fix $D_1 > 0$, $D_2 > 0$, and $R_2 \ge \frac{1}{2} \log \frac{\sigma_X^2}{D_2}$, and let us find the function $R_1(D_1, D_2, R_2)$, which defines
the rate region. (The condition on $R_2$ is due to the fact that if $R_2 < \frac{1}{2} \log \frac{\sigma_X^2}{D_2}$, the rate will not be achievable for
any $R_1$.) To find $R_1$ we need to solve the following optimization problem
$$\text{maximize} \quad \sigma_{Z_2}^2 \tag{51}$$

$$\text{subject to} \quad (2^{2R_2} - 1)\sigma_{Z_2}^2 \ge (1 + \alpha)^2 \sigma_X^2 + \alpha^2 \sigma_Z^2, \tag{52}$$

$$\sigma_{Z_2}^2 (\sigma_X^2 - D_2) \le \alpha^2 (\sigma_X^2 D_2 + \sigma_Z^2 D_2 - \sigma_X^2 \sigma_Z^2) + 2\alpha \sigma_X^2 D_2 + D_2 \sigma_X^2. \tag{53}$$

The objective (51) follows from the fact that $R_1$ depends only on $\sigma_{Z_2}^2$, and (52) and (53) follow from (46) and
(47), respectively. To solve this optimization problem, we divide the problem into four cases, where each case has
a simple solution (each case corresponds to a line in (17)).
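Before turning to the case analysis, the optimization problem (51)-(53) can be explored numerically. The following is a rough sketch, not the paper's method: it replaces the closed-form case analysis with a brute-force grid search over the coefficient in front of Y, and the function names `max_var_z2` and `r1_from_var_z2`, the grid, and the use of base-2 logarithms are all assumptions. It requires `d2 < var_x` and `r2 > 0`, and because the grid is bounded it cannot represent a maximum attained at infinity.

```python
import numpy as np

def max_var_z2(var_x, var_z, d2, r2, alphas):
    """Grid search for the largest variance of Z_2 allowed by (52) and (53).

    Assumes d2 < var_x and r2 > 0 (bits). Returns None if no grid point is feasible.
    """
    a = np.asarray(alphas, dtype=float)
    # (53): upper bound on the variance of Z_2 for each value of alpha
    upper = (a**2 * (var_x * d2 + var_z * d2 - var_x * var_z)
             + 2 * a * var_x * d2 + d2 * var_x) / (var_x - d2)
    # (52): lower bound on the variance of Z_2 for each value of alpha
    lower = ((1 + a)**2 * var_x + a**2 * var_z) / (2.0**(2 * r2) - 1)
    feasible = (upper >= lower) & (upper > 0)
    if not feasible.any():
        return None
    return float(upper[feasible].max())

def r1_from_var_z2(var_x, var_z, d1, var_z2):
    """Evaluate the right-hand side of (48) in bits for a given variance of Z_2."""
    var_x_y = var_x * var_z / (var_x + var_z)
    var_x_wy = 1.0 / (1.0 / var_z2 + 1.0 / var_x + 1.0 / var_z)
    return 0.5 * float(max(np.log2(var_x_y / var_x_wy), np.log2(var_x_y / d1)))
```

With unit variances, $D_2 = 0.35$, $D_1 = 0.4$ and a large $R_2$ (for example 3 bits), a fine grid such as `np.linspace(-5, 5, 200001)` returns a variance close to $0.35/0.3$, for which the conditional variance of X given (W, Y) equals $D_2$.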

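Case 1: For this case we assume that

$$\sigma_X^2 D_2 + \sigma_Z^2 D_2 - \sigma_X^2 \sigma_Z^2 < 0 \iff D_2 < \frac{\sigma_X^2 \sigma_Z^2}{\sigma_X^2 + \sigma_Z^2} = \sigma^2_{X|Y}, \tag{54}$$

and

$$R_2 \ge \frac{1}{2} \log \frac{\sigma_Z^2 (\sigma_X^2 - D_2)\, \sigma_X^2}{(\sigma_Z^2 \sigma_X^2 - D_2 \sigma_Z^2 - D_2 \sigma_X^2)\, D_2}. \tag{55}$$

Because of the assumption in (54), Eq. (53) holds with equality, since otherwise $\sigma_{Z_2}^2$ can be increased until it
hits the boundary of (53).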


Fig. 7. Case 1: the maximum of $\sigma_{Z_2}^2$, where both constraints hold, is obtained at the maximum of Eq. (53).



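The argument that achieves the maximum of a concave quadratic form $a\alpha^2 + g\alpha + c$ is $-\frac{g}{2a}$; hence the argument that
maximizes (53) is

$$\alpha = -\frac{\sigma_X^2 D_2}{\sigma_X^2 D_2 + \sigma_Z^2 D_2 - \sigma_X^2 \sigma_Z^2}, \tag{56}$$

and the maximum is

$$\sigma_{Z_2}^2 = -\frac{\sigma_X^2 \sigma_Z^2 D_2 (\sigma_X^2 - D_2)}{(\sigma_X^2 - D_2)(\sigma_X^2 D_2 + \sigma_Z^2 D_2 - \sigma_Z^2 \sigma_X^2)} = \frac{\sigma_X^2 \sigma_Z^2 D_2}{\sigma_X^2 \sigma_Z^2 - \sigma_X^2 D_2 - \sigma_Z^2 D_2}. \tag{57}$$

Note that (57) can also be written as

$$\frac{1}{\sigma_{Z_2}^2} = \frac{1}{D_2} - \frac{1}{\sigma_X^2} - \frac{1}{\sigma_Z^2} = \frac{1}{D_2} - \frac{1}{\sigma^2_{X|Y}}. \tag{58}$$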



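If $(\alpha, \sigma_{Z_2}^2)$ satisfy Eq. (52), then the solution to the optimization problem is simply $\sigma_{Z_2}^2$ as given in (57), and using (48) we obtain

$$R_1 = \frac{1}{2} \max\left( \log \frac{\sigma^2_{X|Y}}{D_2}, \log \frac{\sigma^2_{X|Y}}{D_1} \right). \tag{59}$$

Now let us investigate when $(\alpha, \sigma_{Z_2}^2)$ satisfies Eq. (52) (or equivalently (46)):

$$\begin{aligned}
R_2 &\ge \frac{1}{2} \log \frac{(1 + \alpha)^2 \sigma_X^2 + \alpha^2 \sigma_Z^2 + \sigma_{Z_2}^2}{\sigma_{Z_2}^2} \\
&\stackrel{(a)}{=} \frac{1}{2} \log \frac{\sigma_X^2 (\alpha^2 \sigma_Z^2 + \sigma_{Z_2}^2)}{D_2\, \sigma_{Z_2}^2} \\
&\stackrel{(b)}{=} \frac{1}{2} \log \left[ \frac{\sigma_X^2}{D_2} \left( 1 + \frac{\alpha^2 (\sigma_X^2 \sigma_Z^2 - \sigma_X^2 D_2 - \sigma_Z^2 D_2)}{\sigma_X^2 D_2} \right) \right] \\
&\stackrel{(c)}{=} \frac{1}{2} \log \frac{\sigma_Z^2 (\sigma_X^2 - D_2)\, \sigma_X^2}{(\sigma_Z^2 \sigma_X^2 - D_2 \sigma_Z^2 - D_2 \sigma_X^2)\, D_2},
\end{aligned} \tag{60}$$

where (a) follows from equality in (47), (b) from (57) and (c) from (56).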
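Case 2: Assume that

$$D_2 < \frac{\sigma_X^2 \sigma_Z^2}{\sigma_Z^2 + \sigma_X^2} = \sigma^2_{X|Y}, \tag{61}$$

and

$$R_2 < \frac{1}{2} \log \frac{\sigma_Z^2 (\sigma_X^2 - D_2)\, \sigma_X^2}{(\sigma_Z^2 \sigma_X^2 - D_2 \sigma_Z^2 - D_2 \sigma_X^2)\, D_2}. \tag{62}$$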

Now if (60) is not satisfied, then the maximum of $\sigma_{Z_2}^2$ should be on the boundary of the constraints, namely, both
(52) and (53) should hold with equality. This is because the upper part of the intersection should be either increasing
or decreasing. Such a case is shown in Fig. 8.

Fig. 8. Case 2: the maximum of $\sigma_{Z_2}^2$, where both constraints hold, is obtained at the intersection of (52) and (53).



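Consider the case where (46) and (47) hold with equality. Then we obtain

$$2^{2R_2} = \frac{\sigma_X^2 (\alpha^2 \sigma_Z^2 + \sigma_{Z_2}^2)}{\sigma_{Z_2}^2 D_2}, \tag{63}$$

which implies

$$\sigma_{Z_2}^2 = \frac{\alpha^2 \sigma_X^2 \sigma_Z^2}{2^{2R_2} D_2 - \sigma_X^2}. \tag{64}$$

Now substituting $\sigma_{Z_2}^2$ given by (64) into (52) with equality, we obtain

$$\frac{\alpha^2 \sigma_Z^2 \sigma_X^2 (2^{2R_2} - 1)}{2^{2R_2} D_2 - \sigma_X^2} = (1 + \alpha)^2 \sigma_X^2 + \alpha^2 \sigma_Z^2, \tag{65}$$

which simplifies to

$$\frac{\alpha^2 \sigma_Z^2 (\sigma_X^2 - D_2)}{D_2 - \sigma_X^2 2^{-2R_2}} = (1 + \alpha)^2 \sigma_X^2. \tag{66}$$

Taking the square root of each side of the equation we obtain two possible solutions for $\alpha$:

$$\frac{1}{\alpha} = -1 \pm \frac{\sigma_Z}{\sigma_X} \sqrt{\frac{\sigma_X^2 - D_2}{D_2 - \sigma_X^2 2^{-2R_2}}}. \tag{67}$$

Since we need to maximize $\sigma_{Z_2}^2$, which is proportional to $\alpha^2$ (see Eq. (64)), we choose the solution with the plus
sign.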
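The Case 2 solution can be sanity-checked numerically by plugging (67) and (64) back into the two constraints. The sketch below is illustrative only: the function name `case2_solution` is hypothetical, rates are assumed to be in bits, and the inputs are assumed to satisfy $D_2 < \sigma_X^2$ and $2^{2R_2} D_2 > \sigma_X^2$ so that the square root and the ratios are well defined.

```python
import math

def case2_solution(var_x, var_z, d2, r2):
    """Evaluate (67) with the plus sign and then (64).

    Assumes d2 < var_x and 2**(2*r2) * d2 > var_x. Returns (alpha, var_z2).
    """
    ratio = math.sqrt((var_x - d2) / (d2 - var_x * 2.0 ** (-2 * r2)))
    inv_alpha = -1.0 + math.sqrt(var_z / var_x) * ratio      # (67), plus sign
    alpha = 1.0 / inv_alpha
    var_z2 = alpha**2 * var_x * var_z / (2.0 ** (2 * r2) * d2 - var_x)  # (64)
    return alpha, var_z2
```

For unit variances, $D_2 = 0.35$ and $R_2 = 1$ bit, the returned pair makes both (52) and (53) hold with equality, which is the boundary point described in the text.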

Case 3: Assume that 



and 



'z u x 

2 2 



D 2 > 



'X\Y' 



X 



(68) 



u > l 1ng °%(° 2 x-D2) °x 
U2 ~ 2 l ° S a 2 * 2 x -D 2 a 2 -D 2 a 2 x D 2 



(69) 




Fig. 9. Case 3: the maximum of $\sigma_{Z_2}^2$, where both constraints hold, is obtained at infinity, since there is an infinite overlap between the constraints.



If 



{o\D 2 



+ <r|£>2 - 
o\-D 2 



'x u z> 



> 



'X 



2 2R * - 1 ' 



(70) 
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which is equivalent to 



> 



'X 



then the maximum of $\sigma_{Z_2}^2$ is obtained at infinity (as illustrated in Fig. 9), which implies that



Ri = \ max ( 0, log -j^- 



= 2 l0g 



~dT' 



(71) 



(72) 




Fig. 10. Case 4: the maximum of $\sigma_{Z_2}^2$, where both constraints hold, is obtained at the intersection of (52) and (53).



Case 4: Assume that 



and 



£> 2 > 



1 



'z u x 



^2 

a X\Y> 



'X 



R 2 < - log ■ 



2—44 



4(4-^) 



(73) 



(74) 



If (71) does not hold, then the maximum of $\sigma_{Z_2}^2$ should be at the boundary of the constraints, namely, (52) and
(53) should hold with equality. This is because the upper part of the intersection should be either increasing or
decreasing. Such a case is shown in Fig. 10. ■



Appendix B
Proof of Theorem 8
Let us rewrite the rate region equations similarly to (41)-(44) as

$$R_1 \ge I(X; V, W \mid Y), \tag{75}$$

$$R_2 \ge I(Y, X; W), \tag{76}$$

$$R_3 \ge I(X; W' \mid W), \tag{77}$$

where the vector $(X, Y, V, W, W')$ is jointly Gaussian distributed and satisfies

$$\sigma^2_{X|W,W'} \le D_2, \tag{78}$$

$$\sigma^2_{X|W,V,Y} \le D_1. \tag{79}$$

Without loss of generality, we may assume that X, Y, W, V have the same structure as in (45) and $W' = X + \eta W + Z'$,
where $Z' \sim N(0, \sigma_{Z'}^2)$ is independent of X, Y, W, V. Furthermore, we note that we can assume that (77) holds with
equality, since if not, we can change $\eta$ and $Z'$ such that equality will hold, and the change will only decrease
$\sigma^2_{X|W,W'}$; therefore (75)-(79) will continue to hold. Now, the equality in (77) implies that

$$\sigma^2_{X|W,W'} = \sigma^2_{X|W}\, 2^{-2R_3}. \tag{80}$$

Hence (78) becomes

$$\sigma^2_{X|W} \le D_2 2^{2R_3}. \tag{81}$$

Now we note that we obtain the same optimization problem as in (46)-(48), except that $D_2$ is replaced by $D_2 2^{2R_3}$. ■
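The step from equality in (77) to (80) can be spelled out as follows. This is a short worked derivation under the assumption that logarithms are taken to base 2 and that all variables are jointly Gaussian, so that conditional differential entropies are determined by conditional variances:

```latex
\begin{align*}
R_3 = I(X; W' \mid W)
  &= h(X \mid W) - h(X \mid W, W') \\
  &= \frac{1}{2}\log\bigl(2\pi e\,\sigma^2_{X|W}\bigr)
   - \frac{1}{2}\log\bigl(2\pi e\,\sigma^2_{X|W,W'}\bigr)
   = \frac{1}{2}\log\frac{\sigma^2_{X|W}}{\sigma^2_{X|W,W'}} \\
  &\Longrightarrow\quad \sigma^2_{X|W,W'} = \sigma^2_{X|W}\,2^{-2R_3}.
\end{align*}
```

Substituting this expression into (78) gives (81), which is the cascade constraint (43) with $D_2$ replaced by $D_2 2^{2R_3}$.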
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